install.packages("dplyr")
install.packages("plotly")
install.packages("networkD3")
Sankey Diagram in R
Introduction
A Sankey diagram is a type of flow diagram in which the width of each arrow is proportional to the flow quantity. It is often used to visualize the flow of resources or information between different entities. In this lab, we'll use two R packages, plotly and networkD3, to generate a Sankey diagram.
Installing and loading the required packages
library(plotly)
library(networkD3)
library(dplyr)
Importing data
df <- read.csv("Sankeydata.csv")
head(df)
    id gender    field personality
1 PT_1 Female Business Introverted
2 PT_2   Male      Law Extroverted
3 PT_3   Male  Science Introverted
4 PT_4 Female      Art Introverted
5 PT_5 Female Business Extroverted
6 PT_6   Male  Science Introverted
Transforming Data
freq_table <- df %>%
  group_by(personality, field) %>%
  summarise(n = n())
freq_table
# A tibble: 8 × 3
# Groups:   personality [2]
  personality field        n
  <chr>       <chr>    <int>
1 Extroverted Art          6
2 Extroverted Business    17
3 Extroverted Law          9
4 Extroverted Science      7
5 Introverted Art         16
6 Introverted Business    15
7 Introverted Law         13
8 Introverted Science     17
Breaking down the terminology used
- Node: Nodes are the source and target points in a Sankey plot. They are represented by rectangles.
- Link: Links connect nodes and depict the flow of entities from a source category to a target category. A link's thickness is proportional to the quantity or frequency it carries.
- Value: The value is the number attached to each link. Here it is the count of entities moving from one category to another.
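To make these three terms concrete, here is a minimal sketch on made-up data. The labels `A`, `B`, `X`, `Y` and the `toy_*` names are hypothetical and are not part of the lab's dataset. It lists the links, derives the nodes from them, and sums the values per node, which is the total flow that a typical Sankey layout uses to size each rectangle.

```r
# Hypothetical toy links: each row is one flow from a source to a target.
toy_links <- data.frame(
  source = c("A", "A", "B"),
  target = c("X", "Y", "Y"),
  value  = c(5, 2, 7),
  stringsAsFactors = FALSE
)

# Nodes are the distinct labels that appear as a source or a target.
toy_nodes <- unique(c(toy_links$source, toy_links$target))

# Total value leaving each source node and arriving at each target node.
outflow <- tapply(toy_links$value, toy_links$source, sum)
inflow  <- tapply(toy_links$value, toy_links$target, sum)

toy_nodes
outflow
inflow
```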
Create Nodes and Links data frames
nodes <- data.frame(name = unique(c(as.character(freq_table$personality),
                                    as.character(freq_table$field))))
nodes
# Subtract 1 because source and target must be zero-based positions in nodes$name
links <- data.frame(source = match(freq_table$personality, nodes$name) - 1,
                    target = match(freq_table$field, nodes$name) - 1,
                    value = freq_table$n,
                    stringsAsFactors = FALSE)
links
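Both plotly and networkD3 expect `source` and `target` to be zero-based positions in the node list, which is why 1 is subtracted from the result of `match()`. The sketch below is an optional sanity check, not part of the original lab. It rebuilds the frequency table from the counts printed earlier, repeats the construction under hypothetical `chk_*` names so nothing above is overwritten, and confirms that every index points at an existing node.

```r
# Counts copied from the frequency table printed earlier in this lab.
ft <- data.frame(
  personality = rep(c("Extroverted", "Introverted"), each = 4),
  field = rep(c("Art", "Business", "Law", "Science"), times = 2),
  n = c(6, 17, 9, 7, 16, 15, 13, 17),
  stringsAsFactors = FALSE
)

# Same construction as in the lab, under hypothetical names (chk_*).
chk_nodes <- data.frame(name = unique(c(ft$personality, ft$field)),
                        stringsAsFactors = FALSE)
chk_links <- data.frame(
  source = match(ft$personality, chk_nodes$name) - 1,
  target = match(ft$field, chk_nodes$name) - 1,
  value = ft$n
)

# Every index must be present and fall in 0 .. nrow(chk_nodes) - 1.
idx <- c(chk_links$source, chk_links$target)
stopifnot(!anyNA(idx), all(idx >= 0), all(idx <= nrow(chk_nodes) - 1))

# Map the zero-based indices back to labels to read the links.
chk_links$from <- chk_nodes$name[chk_links$source + 1]
chk_links$to   <- chk_nodes$name[chk_links$target + 1]
chk_links
```

If a label were missing from the node list, `match()` would return `NA` and the `stopifnot()` call would fail before any plot is drawn.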
1. Sankey Plot using Plotly
link_color <- "rgba(0, 0, 0, 1)"
p1 <- plot_ly(type = "sankey", orientation = "h",
              node = list(label = nodes$name),
              link = list(source = links$source,
                          target = links$target,
                          value = links$value, color = link_color)) %>%
  layout(title = "Sankey Plot: Personality and Field")
library(webshot2)
htmlwidgets::saveWidget(widget = p1, file = "p1.html")
webshot(url = "p1.html", file = "p1.png", delay = 5)
NOTE: A static image of the plot above is shown in place of the interactive widget because of memory issues.
2. Sankey Plot using networkD3
sankey_plot <- sankeyNetwork(Links = links, Nodes = nodes,
                             Source = "source", Target = "target", Value = "value",
                             NodeID = "name", fontSize = 12, nodeWidth = 30)
sankey_plot
Understanding the Sankey Plots created above
Both Sankey plots show how extroverted and introverted personalities are distributed across the fields (art, business, law, and science). A key takeaway is whether a field is preferred by one personality type. For example, more introverted than extroverted people chose art (16 versus 6).
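Raw counts can mislead when the groups differ in size: in the frequency table, introverted participants (61) outnumber extroverted ones (39). As a hedged follow-up that is not part of the original lab, the sketch below re-enters the printed counts as a matrix and uses base R's `prop.table()` to express them as shares. The object names `counts`, `within_field`, and `within_personality` are chosen here for illustration.

```r
# Counts copied from the frequency table printed earlier in this lab.
counts <- matrix(
  c(6, 17, 9, 7,
    16, 15, 13, 17),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    personality = c("Extroverted", "Introverted"),
    field = c("Art", "Business", "Law", "Science")
  )
)

# Share of each personality type within a field (each column sums to 1).
within_field <- prop.table(counts, margin = 2)

# Share of each field within a personality type (each row sums to 1).
within_personality <- prop.table(counts, margin = 1)

round(within_field, 2)
round(within_personality, 2)
```

Reading the art column of `within_field` gives the split of art participants by personality type, while the rows of `within_personality` show how each personality type spreads across fields regardless of group size.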
Conclusion
Sankey diagrams are a versatile way to show how quantities flow between categories. They make patterns visible and communicate complex relationships compactly, which is why they are useful to decision-makers across industries. As this lab shows, a frequency table and a few lines of R are enough to build one with either plotly or networkD3.